Introduction

Crime incidents have become widespread across Texas in recent times. We have some demographic data about those offenses over the years. The dataset consists of 2383 observations with 47 variables, which describe incidents that occurred in the city of Dallas. From the dataset, we can examine the behavior of police officers as well as the subjects with respect to gender, race, etc.

# Packages used by the plots in this report
library(ggplot2); library(dplyr); library(gridExtra); library(plotly); library(highcharter)

##Plot1 -Scatter Plot-officer race
officer_race_plot=ggplot(data, aes(x=OFFICER_RACE, y=OFFICER_GENDER)) + 
  geom_jitter(aes(color=OFFICER_RACE))+
  theme(axis.text.x = element_text(vjust = 0.5, hjust=0.5)) +
  # the points are mapped to color (not fill), so the legend title is set through color
  labs(title = "Race and Gender of the Officers", x ="Officer Race", y="Officer Gender",color="Officer Race")
##Plot2 -Scatter Plot-subject race
subject_race_plot=ggplot(data, aes(x=SUBJECT_RACE, y=SUBJECT_GENDER)) + 
  geom_jitter(aes(color=SUBJECT_RACE))+
  theme(axis.text.x = element_text(vjust = 0.5, hjust=0.5)) +
  labs(title = "Race and Gender of the Offenders", x ="Subject Race", y="Subject Gender",color="Subject Race")
# stack the two plots vertically
grid.arrange(officer_race_plot, subject_race_plot, ncol = 1)

The scatter plots above show how race and gender are distributed among police officers and subjects in Dallas. The first plot shows that white police officers outnumber officers of every other race. Similarly, white female officers outnumber Black and Hispanic female officers. The second plot shows the opposite pattern: Black male and female subjects are more numerous than subjects of any other race.

#PLOT3-SUB-OFFENSE PLOT
Subject_offense_plot=data %>%
  group_by(SUBJECT_OFFENSE) %>%
  filter(!is.na(SUBJECT_OFFENSE)) %>%
  summarise(Count = n()) %>%
  mutate(TotalCount = nrow(data)) %>%
  mutate(Percentage = (Count/TotalCount) * 100) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(SUBJECT_OFFENSE = reorder(SUBJECT_OFFENSE,Count)) %>%
  
  ggplot(aes(x = SUBJECT_OFFENSE,y = Percentage)) +
  geom_bar(stat='identity',colour="white",fill="maroon") +
  geom_text(aes(x = SUBJECT_OFFENSE, y = 1, label = paste0("(",round(Percentage,2)," % )")),
            hjust=0, vjust=.5, size = 3.5, colour = 'black',fontface = 'bold') +
  labs(x = 'Offense', y = 'Percentage', title = 'Subject offense Percentage in Dallas') +coord_flip() + theme_bw()
ggplotly(Subject_offense_plot)

The bar chart above shows that Apprehension by Peace Officer Without Warrant (APOWW) ranks highest at 15.15%. It is followed by the No Arrest category and public intoxication. The five categories with the highest offense rates in Dallas are APOWW, No Arrest, Public Intoxication, Assault and Warrant.
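The percentages in the chart come from dividing each offense count by the total number of rows. A minimal sketch of that calculation in base R, using a small made-up vector (the values are illustrative and are not taken from the Dallas data):

```r
# Toy offense labels; illustrative only, not the real dataset.
offense <- c("APOWW", "APOWW", "No Arrest", "Public Intoxication", NA, "APOWW")

# table() drops NA by default, mirroring filter(!is.na(...)) in the pipeline.
counts <- table(offense)

# Divide by the full length (NA rows included), as nrow(data) does in the pipeline.
percentage <- round(100 * counts / length(offense), 2)

# Largest share first.
sort(percentage, decreasing = TRUE)
```

Because the denominator includes rows with a missing offense, the percentages can sum to less than 100.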

#PLOT 4 - Distribution plot
distribution_plot <- ggplot(data_year, aes(count)) + xlim(c(0,30))+
  geom_density(alpha = 0.5, colour = "black", fill ="red")+ 
  labs(x="Incident counts", y= "Density", title="Distribution of incident rates") + theme_bw()
ggplotly(distribution_plot)

The distribution plot shows how the daily incident counts are distributed. The counts are right-skewed across the year. Days with more than 20 incidents are rare, and the distribution peaks at around 3 to 5 incidents reported per day when the incidents are viewed by division.
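The source does not show how the per-day counts behind this plot were built. One possible way to derive them, sketched in base R with hypothetical dates (the vector name `incident_date` and its values are assumptions for illustration only):

```r
# Hypothetical incident dates; the real column name and format are assumptions.
incident_date <- as.Date(c("2016-01-01", "2016-01-01", "2016-01-02",
                           "2016-01-03", "2016-01-03", "2016-01-03"))

# One row per day with the number of incidents reported on that day.
daily <- as.data.frame(table(incident_date), responseName = "count")

# A mean well above the median would hint at right skew in real data.
summary(daily$count)
```

A data frame shaped like `daily` could then be passed to `geom_density()` in the same way as `data_year`.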

#PLOT 5 - Calendar plot
calendar_plot= ggplot(data_month_day, aes(x= INCIDENT_DAY, y=INC_MONTH,fill = count)) + geom_tile( ) + 
  geom_text(aes(INCIDENT_DAY, INC_MONTH, label = count), color = "black", size = 4) + 
  scale_y_discrete("Months",labels=c("January","February", "March", "April","May", "June","July","August", "September","October","November","December")) + 
  labs(x="Days of Month", y= "Months", title="Incident Rates across Days and Months")+
  scale_fill_gradientn(colours = c("#3794bf", "#FFFFFF", "#df8640"))
ggplotly(calendar_plot)

The calendar plot above shows that incidents occurred on every day of every month. The highest count of incidents in Dallas occurred on a Sunday in February. Otherwise, the daily count stayed between roughly 10 and 30 throughout the year.

#PLOT 6 - Officer race capturing Subject race
capture_plot= ggplot(data,aes(x = OFFICER_RACE,fill = SUBJECT_RACE)) + 
  geom_bar(position = "dodge") +labs(title= "Officer race capturing subject race",x="Officer Race",fill="Subject Race")
ggplotly(capture_plot)  

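The graph above shows that officers of every race (white, Black, Hispanic) reported a high count of Black subjects. This suggests a racial disparity in the city's incidents, although raw counts alone do not show what share of each officer group's incidents involved Black subjects.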
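Because the groups of officers differ in size, raw counts can be hard to compare across them. A minimal sketch, using made-up race pairs rather than the real data, of how row-wise shares could be computed with base R's `prop.table()`:

```r
# Toy officer/subject race pairs; illustrative only, not the real dataset.
officer <- c("White", "White", "White", "Black", "Black", "Hispanic")
subject <- c("Black", "Black", "White", "Black", "Hispanic", "Black")

# Cross-tabulate officer race (rows) against subject race (columns).
counts <- table(officer, subject)

# margin = 1 normalizes each officer-race row so that it sums to 1.
shares <- round(prop.table(counts, margin = 1), 2)
shares
```

Each row then gives the share of subject races among the incidents reported by officers of one race, which makes the groups directly comparable.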

#PLOT 7- Subject description plot
desc_plot = ggplot(data,aes(x = SUBJECT_DESCRIPTION,fill = SUBJECT_RACE)) + 
  geom_bar(position = "dodge") + labs(title = "Subject arrest depending on Offense", x="Subject Description", fill="Subject Race")+
  theme(axis.text.x = element_text(size = 6,angle = 45,hjust = 0.5, vjust = 0.5))+facet_grid(~SUBJECT_WAS_ARRESTED)
ggplotly(desc_plot)    

So far we have seen that the subjects were mostly Black and the officers mostly white. The graph above shows that many of the arrested subjects were under the influence of alcohol or drugs. Some subjects were also described as mentally unstable. Far more subjects were arrested than not, although a number of subjects were not arrested even though they were involved in an incident.

#PLOT 8- BOX plot for years on force 
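# convert via character so that a factor column yields the recorded years, not factor codes
data$OFFICER_YEARS_ON_FORCE=as.numeric(as.character(data$OFFICER_YEARS_ON_FORCE))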

box_plot= ggplot(data, aes(x = OFFICER_GENDER, y = OFFICER_YEARS_ON_FORCE, fill = OFFICER_GENDER))+
  geom_boxplot(show.legend = FALSE)+
  theme_minimal()+
  scale_fill_manual(values = c("orangered", "steelblue"))+scale_y_continuous(limits=c(0,30))+
  labs(title = "Box Plot for Officer's Service (in years)",x="Officer Gender",y="Officer years on force")
ggplotly(box_plot)

The box plots above show that male police officers have served more years on the force than female police officers. Most officers, however, have served roughly 3 to 10 years. Very few officers have had long careers on the force.
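The comparison read off the box plot can also be checked numerically. A minimal sketch in base R, using made-up years of service rather than the real data, that computes the median and interquartile range per gender:

```r
# Toy years of service; illustrative only, not the real dataset.
gender <- c("Male", "Male", "Male", "Female", "Female", "Female")
years  <- c(2, 8, 12, 3, 5, 7)

# Median years on force per gender (the thick line inside each box).
medians <- tapply(years, gender, median)

# Interquartile range per gender (the height of each box).
iqr <- tapply(years, gender, IQR)

medians
iqr
```

On the real data, the same two calls on `data$OFFICER_YEARS_ON_FORCE` and `data$OFFICER_GENDER` would give the figures the box plot draws.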

#plot 9- Pie plot for division wise incidents
tab=table(data$DIVISION)
lbls <- paste(names(tab), "\n", tab, sep="")
color=c("green","blue","pink","yellow","maroon","skyblue","grey")
pie(tab, col=color, labels = lbls,main="Incident count per Dallas Division")

The pie chart above shows that the highest number of incidents was reported in the Central division of Dallas, while the Northwest division had very few. The remaining divisions (North Central, South Central, Northeast, Southeast and Southwest) each reported around 300 incidents.

#PLOT 10- Pyramid plot for Reason for force
df_category = sort(table(data$REASON_FOR_FORCE),decreasing = TRUE)
df_category = data.frame(df_category)
#df_category = data.frame(df_category[df_category > 1000])
colnames(df_category) = c("Category", "Frequency")
df_category$Percentage = df_category$Frequency / sum(df_category$Frequency)
#view(df_category)
pyramid_plot <- df_category %>%
  hchart("pyramid", hcaes(x = Category, y = Frequency),name = "Reason for Force")
pyramid_plot

The pyramid plot above shows that making an arrest was the main reason police used force. Other reasons include subjects posing a danger to others or to themselves, and active aggression.

#Plot 11- Lollipop plot for Officer injury types
df_category = sort(table(data$OFFICER_INJURY_TYPE),decreasing = TRUE)
df_category = data.frame(df_category)
#df_category = data.frame(df_category[df_category > 1000])
colnames(df_category) = c("Category", "Frequency")
df_category$Percentage = df_category$Frequency / sum(df_category$Frequency)
#view(df_category)

Officer_injury_plot= ggplot(df_category, aes(x=Category, y=Frequency)) + 
  geom_point(size=3) + geom_segment(aes(x=Category, xend=Category,y=0, yend=Frequency)) +  
  labs(title="Lollipop Chart for Officer Injuries",x="Officer Injuries") +
  theme(axis.text.x = element_text(size = 8,angle = 45,hjust = 0.5, vjust = 0.5))

ggplotly(Officer_injury_plot)

The lollipop chart above shows that most police officers sustained no injuries during the incidents. However, 100 police officers suffered abrasions or lacerations. Very few officers were injured by bites or reported pain or dizziness.

# Plot 12-Subject gender race-Arrest
# geom_bar() counts rows per race; facet rows split by arrest status, columns by gender
gender_arrest= ggplot(data,aes(x=SUBJECT_RACE))+geom_bar(colour="white",fill="maroon",width = 0.5)+facet_grid(SUBJECT_WAS_ARRESTED~SUBJECT_GENDER)+
  theme_bw()+labs(title="Subject Arrest per Gender-Race",x="Subject Race",y="Number of Subjects")
ggplotly(gender_arrest)

The graph above shows that Black male subjects account for the largest number of incidents and of arrests. Similarly, many Black female subjects were involved in incidents and arrested.

# Plot 13-Subject gender offense-Arrest
# geom_bar() counts rows per offense; facet rows split by arrest status, columns by gender
Offense_plot = ggplot(data,aes(x=SUBJECT_OFFENSE))+geom_bar(colour="white",fill="maroon",width = 0.5)+facet_grid(SUBJECT_WAS_ARRESTED~SUBJECT_GENDER)+
  coord_flip()+labs(title="Subject Arrest per Offense-Gender",x="Subject Offense",y="Number of Subjects")+
  theme(axis.text.x = element_text(size = 4,hjust = 0.5, vjust = 0.5))
ggplotly(Offense_plot)

The graph above shows that male subjects were most often arrested for public intoxication, APOWW and warrants. The same holds for female subjects. The subjects who were not arrested fall mostly under the large No Arrest category.

# Plot 14-officer hospitalization
# geom_bar() counts rows per injury status; facet rows split by hospitalization, columns by gender
hospitalization_plot = ggplot(data,aes(x=OFFICER_INJURY))+geom_bar(colour="white",fill="maroon",width = 0.5)+facet_grid(OFFICER_HOSPITALIZATION~OFFICER_GENDER)+
  labs(title = "Officer Hospitalization per Injury-Gender",x="Officer Injured",y="Number of Officers")
ggplotly(hospitalization_plot)

The graph above shows that most police officers were not injured during the incidents, so no hospitalization was required. A few officers were injured but not hospitalized.

#PLOT 15 - Map from shape files
shpMap <- ggplot(data = shp_df, aes(long,lat)) +
  geom_polygon(aes(group = group), fill="red") +
  coord_equal() +
  labs(x = "Longitude (Degrees)",
       y = "Latitude (Degrees)",
       title = "Map",
       subtitle = "Map - Based on the Lat Long in Shape Files")
shpMap

The map above shows the parts of the city of Dallas where the crime incidents occurred.

#PLOT16-SUBJECT-INJURY PLOT
SUBJECT_INJURY_TYPE_plot=data %>%
  group_by(SUBJECT_INJURY_TYPE) %>%
  filter(!is.na(SUBJECT_INJURY_TYPE)) %>%
  summarise(Count = n()) %>%
  mutate(TotalCount = nrow(data)) %>%
  mutate(Percentage = (Count/TotalCount) * 100) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(SUBJECT_INJURY_TYPE = reorder(SUBJECT_INJURY_TYPE,Count)) %>%
  
  ggplot(aes(x = SUBJECT_INJURY_TYPE,y = Percentage)) +
  geom_bar(stat='identity',colour="white",fill="maroon") +
  geom_text(aes(x = SUBJECT_INJURY_TYPE, y = 1, label = paste0("(",round(Percentage,2)," % )")),
            hjust=0, vjust=.5, size = 3, colour = 'black',fontface = 'bold') +
  labs(x = 'Subject Injuries', y = 'Percentage', title = 'Subject injury Percentage in Dallas') +coord_flip() + theme_bw()
ggplotly(SUBJECT_INJURY_TYPE_plot)

The bar graph above shows that 70.08% of the subjects were not injured. However, a few subjects suffered injuries such as Abrasion/Scrape and Laceration/Cut.

Conclusion

Altogether, most of the incidents involved Black subjects and were reported by white police officers. Monitoring emotional health and controlling alcohol and drug abuse could help reduce crime in the city of Dallas. Moreover, checking the severity of injuries and the need for hospitalization could save the lives of many officers and subjects. Finding a way to stop gender and race discrimination is the next challenge for the city of Dallas.